-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vectorize min/max_element
using SSE4.1 for floats
#3928
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
and remove extra coverage
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Keep the fix within minmax_element.cpp, though.
I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
@@ -1089,6 +1351,16 @@ const void* __stdcall __std_min_element_8( | |||
return _Minmax_element<_Mode_min, _Minmax_traits_8>(_First, _Last, _Signed); | |||
} | |||
|
|||
const void* __stdcall __std_min_element_f( | |||
const void* const _First, const void* const _Last, const bool _Unused) noexcept { | |||
return _Minmax_element<_Mode_min, _Minmax_traits_f>(_First, _Last, _Unused); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea why we have these _Unused
arguments which are, in fact, used. This seems confusing enough to warrant an explanation in the source. I see that _Minmax_element
needs a third argument, but I don't get why we need to pass a false
in from the header-only code, nor why we need to flow that argument into _Minmax_element
instead of passing a literal false
here to enable constant propagation.
Is this intended for future use? This is always-statically-linked code so there's no need to allow for old binaries to work with new headers AFAICS.
EDIT: Ah, I see. A comment here directing the reader to the explanation on _Minmax_element
would be nice, but totally not worth resetting testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the same journey of discovery, I can add a small comment in a followup PR.
Thanks for maximizing the performance and minimizing the time taken for these algorithms with everyone's favorite types! 🚀 😻 🎁 |
Re: handling NaNs. This change has broken code which does pass NAN in the data. Applications can use this to indicate "missing" data. In the past, they were skipped. Now, it is a crash. We can craft a custom Compare function for std::min_element or for std::max_element but not for std::minmax_element. |
Can you make a more detailed report, with your data, expected results, actual results. |
@jschroedl , I've managed to recreate this on randomized test. As a workaroung you can define Note tha mixing NaN with few other numbers in input is an undefined behavior, as such input violates strict weak ordering. |
Per [alg.min.max], it seems that processing NaN is UB for |
Isn't [alg.sorting]/3 covering this on a higher level? |
Or you can define |
Thanks for correction, and it's sure that UB is also present |
The scenario of having NaN in collections of double is not so far-fetched as it can be good indications during telemetry or data collection as a sensor going offline, for example. I'm sure you can imagine other scenarios. We have used NaN to indicate 'missing' data for many many years and portably (mac, win) have been able to use these algorithms and have NaN skipped so our use of this is historical. Clearly, we'll need to revisit this usage. As for the crash, a simple test case I have is the following will crash now whereas prior it would return 0, 99 for min, max.
Thank you for the workaround flag, I will investigate using that. |
Thanks for providing the test case. Note that your repro has UB, as @frederick-vs-ja and me have discussed just above. The non-UB equivalent repro looks like this to me: #include <iostream>
#include <vector>
#include <limits>
int main()
{
std::vector<double> data{ 1, 1, NAN, 1, 1 };
double dMin = *std::min_element(begin(data), end(data));
std::cout << "\nMin is " << dMin;
double dMax = *std::max_element(begin(data), end(data));
std::cout << "\nMax is " << dMax;
} I intend to fix this to output the following:
But I do not intend to fix your example, it may or may not be fixed. As a proper fix on your side, I suggest having hand-written loops instead of using STL algorithms to avoid undefined behavior. |
@jschroedl , the issue you reported resloved as However it made me to look into |
@AlexGuteniev, what makes things worse with crashes on NaNs is that it changes existing behavior (<= MSVC 17.9) and causes a crash that can't even be caught via a |
I admit I didn't expect crashes on NANs when doing this change, only wrong results. Generally, optimizations indeed shouldn't break existing code, especially into a runtime crash. This reminds me of a precedent when on one of platforms The case with So I generally don't think that the breakage of some not well defined code necessarily justifies the revert of the optimization. Still I think in some cases breaking of not well defined code may be fixed:
For 1 I don't have figures. So far we have at least few complains. Is this massive? Probably not yet. What currently prevents me from submitting a fix which keeps the optimization, but makes NAN values don't crash it is:
|
Since we are working with the C++17 standard at the moment, I looked up the C++17 standard-definition of max_element (§ 28.7.8):
This of course has the consequence, that if the first element of the range is NaN, a valid max cannot be found. But it is nevertheless well-defined behavior and should not lead to unexpected crashes. I'm not sure if C++20 allows to change that by requiring the indirect_strict_weak_order concept. But as I see it, C++17 and earlier does not require us to expect undefined behavior. |
It is undefined behavior according to C++17 you linked as well. |
Resolves #2439
Enable for all modes. except
/fp:except
. Not sure yet about that one.The
/fp:strict
and/fp:precise
modes are fine, since the ultimate comparison is done via_mm_cmpeq_ps
/_mm_cmpgt_ps
, which are IEEE 754 compliant, and compare positive/negative zeros correctly as equal.We still have unpredicted behavior on NaNs. @StephanTLavavej thinks we should not care about that. I agree:
comp
shall induce a strict weak ordering on the values", and with having both NaNs and non-NaNs it does not hold true_mm_cmpeq_ps
/_mm_cmpeq_pd
against self, and still win over the scalar implementationI didn't use Uniform Distribution to directly form values, instead uses values from a pool that also has +/- infinity, and +/- zero -- thanks @statementreply for pointing this out when the test was updated in a preceding change.
No AVX yet. I think will do every SSE4,2 min / max, and just then look into AVX.
🏁 Perf benchmark
Without vectorization:
With vectorization:
Full results
Without vectorization:
With vectorization: